Abstract

We show that several versions of Floyd and Rivest's algorithm Select [Comm. ACM 18 (1975) 173] for finding the $k$th smallest of $n$ elements require at most $n + \min\{k, n-k\} + o(n)$ comparisons on average, even when equal elements occur. This parallels our recent analysis of another variant due to Floyd and Rivest [Comm. ACM 18 (1975) 165-172]. Our computational results suggest that both variants perform well in practice, and may compete with other selection methods, such as Hoare's Find or quickselect with median-of-3 pivots.

Key words. Selection, medians, partitioning, computational complexity.

1 Introduction

The selection problem is defined as follows: Given a set $X := \{x_j\}_{j=1}^n$ of $n$ elements, a total order $<$ on $X$, and an integer $1 \le k \le n$, find the $k$th smallest element of $X$, i.e., an element $x$ of $X$ for which there are at most $k-1$ elements $x_j < x$ and at least $k$ elements $x_j \le x$. The median of $X$ is the $\lceil n/2\rceil$th smallest element of $X$.

Selection is one of the fundamental problems in computer science; see, e.g., the references in [DHUZ01, DoZ99, DoZ01] and [Knu98, §5.3.3]. Most references concentrate on the number of comparisons between pairs of elements made in selection algorithms. In the worst case, selection needs at least $(2+\epsilon)n$ comparisons [DoZ01], whereas the algorithm of [BFP+72] makes at most $5.43n$, that of [SPP76] needs $3n + o(n)$, and that in [DoZ99] takes $2.95n + o(n)$. In the average case, for $k \le \lceil n/2\rceil$, at least $n + k - O(1)$ comparisons are necessary [CuM89], whereas the best upper bound is $n + k + O(n^{1/2}\ln^{1/2} n)$ [Knu98, Eq. (5.3.3.16)]. The classical algorithm Find of [Hoa61], also known as quickselect, has an upper bound of $3.39n + o(n)$ for $k = \lceil n/2\rceil$ in the average case [Knu98, Ex. 5.2.2-32], which improves to $2.75n + o(n)$ for median-of-3 pivots [Gru99, KMP97].

In practice Find is most popular. One reason is that the algorithms of [BFP+72, SPP76] are much slower on the average [Mus97, Val00], whereas [KMP97] adds that other methods proposed so far, although better than Find in theory, are not practical because they are difficult to implement, their constant factors and hidden lower order terms are too large, etc. It is quite surprising that these references [KMP97, Mus97, Val00] ignore the algorithm Select of [FlR75b], since most textbooks mention that Select is asymptotically faster than Find. In contrast, this paper shows that Select can compete with Find in both theory and practice, even for fairly small values of the input size $n$.

We now outline our contributions in more detail. The initial two versions of Select [FlR75b] had gaps in their analysis (cf. [Bro76, PRKT83], [Knu98, Ex. 5.3.3-24]); the first version was validated in [Kiw03b], and the second one will be addressed elsewhere. This paper deals with the third version of Select from [FlR75a], which operates as follows. Using a small random sample, it finds an element $v$ almost sure to be just above the $k$th if $k < n/2$, or below the $k$th if $k \ge n/2$. Partitioning $X$ about $v$ leaves $\min\{k, n-k\} + o(n)$ elements on average for the next recursive call, in which $k$ is near 1 or $n$ with high probability, so this second call eliminates almost all the remaining elements.

Apparently this version of Select has not been analyzed in the literature, even in the case of distinct elements. We first revise it slightly to simplify our analysis. Then, without assuming that the elements are distinct, we show that Select needs at most $n + \min\{k, n-k\} + O(n^{2/3}\ln^{1/3} n)$ comparisons on average, with $\ln^{1/3} n$ replaced by $\ln^{1/2} n$ for the original samples of [FlR75a]. Thus the average cost of Select reaches the lower bounds of $1.5n + o(n)$ for median selection and $1.25n + o(n)$ for selecting an element of random rank. For the latter task, Find has the bound $2n + o(n)$ when its pivot is set to the median of a random sample of $s$ elements, with $s \to \infty$, $s/n \to 0$ as $n \to \infty$ [MaR01]; thus Select improves upon Find mostly by using $k$, the rank of the element to be found, for selecting the pivot $v$ in each recursive call.

Select can be implemented by using the tripartitioning schemes of [Kiw03a, §5], which include a modified scheme of [BeM93]; more traditional bipartitioning schemes [Kiw03a, §2] can perform quite poorly in Select when equal elements occur. We add that the implementation of [FlR75a] avoids random number generation by assuming that the input file is in random order, but this results in poor performance on some inputs of [Val00]; hence our implementation of Select employs random sampling.

Our computational experience shows that Select outperforms even quite sophisticated implementations of Find in both comparison counts and computing times. To save space, only selected results are reported for the version of [Val00], but our experience with other versions on many different inputs was similar. Select turned out to be more stable than Find, having much smaller variations of solution times and numbers of comparisons. Quite surprisingly, contrary to the folklore saying that Select is only asymptotically faster than Find, Select makes significantly fewer comparisons even for small inputs (cf. Tab. 7.8).

To relate our results with those of [Kiw03b], let's call qSelect the quintary method of [Kiw03b] stemming from [FlR75b, §2.1]. qSelect eliminates almost all elements on its first call by using two pivots, almost sure to be just below and above the $k$th element, in a quintary partitioning scheme. Thus most work occurs on the first call of qSelect, which corresponds to the first two calls of Select. Hence Select and qSelect share the same efficiency estimates, and in practice make similarly many comparisons. However, qSelect tends to be slightly faster on median finding: although its quintary scheme is more complex, most of its work is spent on the first pass through $X$, whereas Select first partitions $X$ and then the remaining part (about half) of $X$ on its second call to achieve a similar problem reduction. On the other hand, Select makes fewer comparisons on small inputs. Of course, future work should assess more fully the relative merits of Select and qSelect. For now, the tests reported in [Kiw03a, Kiw03b] and in §7 suggest that both Select and qSelect can compete successfully with refined implementations of Find.

The paper is organized as follows. A general version of Select is introduced in §2, and its basic features are analyzed in §3. The average performance of Select is studied in §4. A modification that improves practical performance is introduced in §5. Partitioning schemes are discussed in §6. Finally, our computational results are reported in §7.

Our notation is fairly standard. $|A|$ denotes the cardinality of a set $A$. In a given probability space, $\mathrm{P}$ is the probability measure, $\mathrm{E}$ is the mean-value operator and $\mathrm{P}[\cdot\,|\,\mathcal{E}]$ is the probability conditioned on an event $\mathcal{E}$; the complement of $\mathcal{E}$ is denoted by $\mathcal{E}'$.

2 The algorithm Select 

In this section we describe a general version of Select in terms of two auxiliary functions 
s(n) and g(n) (the sample size and rank gap), which will be chosen later. We omit their 
arguments in general, as no confusion can arise. 

Algorithm 2.1.

Select$(X, k)$ (Selects the $k$th smallest element of $X$, with $1 \le k \le n := |X|$)

Step 1 (Initiation). If $n = 1$, return $x_1$. Choose the sample size $s \le n - 1$ and gap $g > 0$.

Step 2 (Sample selection). Pick randomly a sample $S := \{y_1, \ldots, y_s\}$ from $X$.

Step 3 (Pivot selection). Let $v$ be the output of Select$(S, i_v)$, where

$$i_v := \begin{cases} \min\{\lceil ks/n + g\rceil, s\} & \text{if } k < n/2, \\ \max\{\lceil ks/n - g\rceil, 1\} & \text{if } k \ge n/2. \end{cases} \tag{2.1}$$

Step 4 (Partitioning). By comparing each element $x$ of $X \setminus S$ to $v$, partition $X$ into the three sets $L := \{x \in X : x < v\}$, $E := \{x \in X : x = v\}$ and $R := \{x \in X : v < x\}$.

Step 5 (Stopping test). If $|L| < k \le |L \cup E|$, return $v$.

Step 6 (Reduction). If $k \le |L|$, set $\hat X := L$, $\hat n := |\hat X|$ and $\hat k := k$; else set $\hat X := R$, $\hat n := |\hat X|$ and $\hat k := k - |L \cup E|$.

Step 7 (Recursion). Return Select$(\hat X, \hat k)$.
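
To make the steps of Algorithm 2.1 concrete, the following Python sketch mirrors Steps 1-7 on a plain list. It is only an illustration under several assumptions that are not part of the algorithm statement: the sample size and gap follow the choice (4.1) of §4.1, the names `select`, `alpha`, `beta` and `n_cut` are hypothetical, inputs of at most `n_cut` elements are simply sorted, and Step 4 compares every element of the input with the pivot instead of skipping the sample.

```python
import math
import random


def select(xs, k, alpha=1.0, beta=1.0, n_cut=600, rng=random):
    """Return the k-th smallest (1-based) element of the list xs."""
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(xs)")
    if n <= n_cut:
        # Assumption: small inputs are handled by sorting.
        return sorted(xs)[k - 1]

    # Step 1: sample size s <= n - 1 and gap g > 0, chosen as in (4.1).
    f = n ** (2.0 / 3.0) * math.log(n) ** (1.0 / 3.0)
    s = min(math.ceil(alpha * f), n - 1)
    g = math.sqrt(beta * s * math.log(n))

    # Step 2: random sample without replacement.
    sample = rng.sample(xs, s)

    # Step 3: pivot rank i_v from (2.1), then recursive pivot selection.
    if k < n / 2:
        i_v = min(math.ceil(k * s / n + g), s)
    else:
        i_v = max(math.ceil(k * s / n - g), 1)
    v = select(sample, i_v, alpha, beta, n_cut, rng)

    # Step 4: ternary partition about v (here over the whole input).
    left = [x for x in xs if x < v]
    right = [x for x in xs if x > v]
    n_left_or_equal = n - len(right)

    # Step 5: stopping test.
    if len(left) < k <= n_left_or_equal:
        return v

    # Steps 6-7: reduce to the side that contains the answer and recurse.
    if k <= len(left):
        return select(left, k, alpha, beta, n_cut, rng)
    return select(right, k - n_left_or_equal, alpha, beta, n_cut, rng)
```

Because the pivot itself never enters `left` or `right`, each recursive call works on a strictly smaller list, which is the termination argument of Remark 2.2(a).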

A few remarks on the algorithm are in order. 

Remarks 2.2. (a) The correctness and finiteness of Select stem by induction from the following observations. The returns of Steps 1 and 5 deliver the desired element. At Step 6, $\hat X$ and $\hat k$ are chosen so that the $k$th smallest element of $X$ is the $\hat k$th smallest element of $\hat X$, and $\hat n < n$ (since $v \notin \hat X$). Also $|S| < n$ for the recursive call at Step 3.

(b) When Step 5 returns $v$, Select may also return information about the positions of the elements of $X$ relative to $v$. For instance, if $X$ is stored as an array, its $k$ smallest elements may be placed first via interchanges at Step 4 (cf. §6). Hence Step 4 need only compare $v$ with the elements of $X \setminus S$.

(c) The following elementary property is needed in §4. Let $c_n$ denote the maximum number of comparisons taken by Select on any input of size $n$. Since Step 3 makes at most $c_s$ comparisons with $s < n$, Step 4 needs at most $n - s$, and Step 7 takes at most $c_{\hat n}$ with $\hat n < n$, by induction $c_n < \infty$ for all $n$.

3 Sampling deviations 

In this section we analyze general features of sampling used by Select. Our analysis hinges on the following bound on the tail of the hypergeometric distribution established in [Hoe63] and rederived shortly in [Chv79].

Fact 3.1. Let $s$ balls be chosen uniformly at random from a set of $n$ balls, of which $r$ are red, and $r'$ be the random variable representing the number of red balls drawn. Let $p := r/n$. Then

$$\mathrm{P}[\,r' \ge ps + g\,] \le e^{-2g^2/s} \quad \forall g \ge 0. \tag{3.1}$$
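
As a quick numerical sanity check of Fact 3.1, the sketch below computes the exact hypergeometric tail with integer binomial coefficients and compares it with the bound (3.1). The function names and the parameter values in the usage note are illustrative assumptions, not taken from the paper.

```python
from math import ceil, comb, exp


def hypergeom_tail(n, r, s, t):
    """Exact P[r' >= t] when s balls are drawn without replacement
    from n balls of which r are red."""
    lo = max(ceil(t), 0)
    hi = min(r, s)
    favourable = sum(comb(r, j) * comb(n - r, s - j) for j in range(lo, hi + 1))
    return favourable / comb(n, s)


def hoeffding_bound(g, s):
    """Right-hand side of (3.1)."""
    return exp(-2.0 * g * g / s)
```

For instance, with `n = 100`, `r = 30`, `s = 20` and `g = 3`, the call `hypergeom_tail(100, 30, 20, 30 * 20 / 100 + 3)` can be compared directly with `hoeffding_bound(3, 20)`.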

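Denote by $x_1^* \le \ldots \le x_n^*$ and $y_1^* \le \ldots \le y_s^*$ the sorted elements of the input set $X$ and the sample set $S$, respectively, so that $v = y_{i_v}^*$. The following result will give bounds on the position of $v$ in the sorted input sequence.

Lemma 3.2. Suppose $\bar\imath := \max\{1, \min(\lceil \kappa s\rceil, s)\}$, $\bar\jmath_l := \max\{\lceil \kappa n - gn/s\rceil, 1\}$, and $\bar\jmath_r := \min\{\lceil \kappa n + gn/s\rceil, n\}$, where $-g < \kappa s \le s + g$, $1 \le s \le n$ and $g > 0$. Then:

(a) $\mathrm{P}[\,y_{\bar\imath}^* < x_{\bar\jmath_l}^*\,] \le e^{-2g^2/s}$ if $\bar\imath \ge \lceil \kappa s\rceil$.

(b) $\mathrm{P}[\,x_{\bar\jmath_r}^* < y_{\bar\imath}^*\,] \le e^{-2g^2/s}$ if $\bar\imath \le \lceil \kappa s\rceil$.

Proof. Note that $-g < \kappa s \le s + g$ implies that $\bar\jmath_l \le n$ and $\bar\jmath_r \ge 1$ are well-defined.

(a) If $y_{\bar\imath}^* < x_{\bar\jmath_l}^*$, at least $\bar\imath$ samples satisfy $y_i \le x_{\bar r}^*$, where $\bar r := \max_{x_j^* < x_{\bar\jmath_l}^*} j$. In the setting of Fact 3.1, we have $\bar r$ red elements $x_j \le x_{\bar r}^*$, $ps = \bar r s/n$ and $r' \ge \bar\imath$. Now, $1 \le \bar r \le \bar\jmath_l - 1$ implies $2 \le \bar\jmath_l = \lceil \kappa n - gn/s\rceil < \kappa n - gn/s + 1$, so $-\bar r s/n > -\kappa s + g$. Hence $\bar\imath - ps - g > \kappa s - \kappa s + g - g = 0$, i.e., $r' > ps + g$. Thus $\mathrm{P}[\,y_{\bar\imath}^* < x_{\bar\jmath_l}^*\,] \le \mathrm{P}[\,r' \ge ps + g\,] \le e^{-2g^2/s}$ by (3.1).

(b) If $x_{\bar\jmath_r}^* < y_{\bar\imath}^*$, $s - \bar\imath + 1$ samples are at least $x_{\bar\jmath+1}^*$ with $\bar\jmath := \max_{x_j^* = x_{\bar\jmath_r}^*} j$. Thus we have $\bar r := n - \bar\jmath$ red elements $x_j \ge x_{\bar\jmath+1}^*$, $ps = s - \bar\jmath s/n$ and $r' \ge s - \bar\imath + 1$. Since $\bar\imath < \kappa s + 1$ and $n > \bar\jmath \ge \bar\jmath_r \ge \kappa n + gn/s$, we get $s - \bar\imath + 1 - ps - g > \bar\jmath s/n - \kappa s - g \ge \kappa s + g - \kappa s - g = 0$. Hence $r' > ps + g$ and $\mathrm{P}[\,x_{\bar\jmath_r}^* < y_{\bar\imath}^*\,] \le \mathrm{P}[\,r' \ge ps + g\,] \le e^{-2g^2/s}$ by (3.1). □

We now bound the position of $v$ relative to $x_k^*$, $x_{k_l}^*$ and $x_{k_r}^*$, where

$$k_l := \max\{\lceil k - 2gn/s\rceil, 1\} \quad\text{and}\quad k_r := \min\{\lceil k + 2gn/s\rceil, n\}. \tag{3.2}$$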



Corollary 3.3. (a) $\mathrm{P}[\,v < x_k^*\,] \le e^{-2g^2/s}$ if $i_v = \lceil ks/n + g\rceil$ and $k < n/2$.

(b) $\mathrm{P}[\,x_{k_r}^* < v\,] \le e^{-2g^2/s}$ if $k < n/2$.

(c) $\mathrm{P}[\,x_k^* < v\,] \le e^{-2g^2/s}$ if $i_v = \lceil ks/n - g\rceil$ and $k \ge n/2$.

(d) $\mathrm{P}[\,v < x_{k_l}^*\,] \le e^{-2g^2/s}$ if $k \ge n/2$.

(e) If $k < n/2$, then $i_v \ne \lceil ks/n + g\rceil$ iff $n < k + gn/s$; similarly, if $k \ge n/2$, then $i_v \ne \lceil ks/n - g\rceil$ iff $k \le gn/s$.

Proof. Use Lem. 3.2 with $\kappa s = ks/n + g$ for (a,b), and $\kappa s = ks/n - g$ for (c,d). □

Table 4.1: Sample size $f(n) := n^{2/3}\ln^{1/3} n$ and relative sample size $\phi(n) := f(n)/n$.

    n        10^3     10^4     10^5     10^6     5·10^6   10^7     5·10^7   10^8
    f(n)     190.449  972.953  4864.76  23995.0  72287.1  117248   353885   568986
    φ(n)     .190449  .097295  .048648  .023995  .014557  .011725  .007078  .005690

4 Average case performance 

In this section we analyze the average performance of Select for various sample sizes. 
4.1 Floyd-Rivest's samples 

For positive constants $\alpha$ and $\beta$, consider choosing $s = s(n)$ and $g = g(n)$ as

$$s := \min\{\lceil \alpha f(n)\rceil, n - 1\} \quad\text{and}\quad g := (\beta s \ln n)^{1/2} \quad\text{with}\quad f(n) := n^{2/3}\ln^{1/3} n. \tag{4.1}$$

This form of $g$ gives a probability bound $e^{-2g^2/s} = n^{-2\beta}$ for Cor. 3.3. To get more feeling, suppose $\alpha = \beta = 1$ and $s = f(n)$. Let $\phi(n) := f(n)/n$. Then $s/n = g/s = \phi(n)$ and it will be seen that the recursive call reduces $n$ at least by the factor $4\phi(n)$ on average, i.e., $\phi(n)$ is a contraction factor; note that $\phi(n) \approx 2.4\%$ for $n = 10^6$ (cf. Tab. 4.1).
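
The choice (4.1) is easy to evaluate numerically. The sketch below (with hypothetical function names) computes $f(n)$, $s$ and $g$, and can be used to reproduce the entries of Table 4.1 or to check that $e^{-2g^2/s} = n^{-2\beta}$.

```python
import math


def f(n):
    """Sample-size scale f(n) = n^(2/3) * ln(n)^(1/3) from (4.1)."""
    return n ** (2.0 / 3.0) * math.log(n) ** (1.0 / 3.0)


def sample_size_and_gap(n, alpha=1.0, beta=1.0):
    """Return (s, g) chosen as in (4.1) for an input of size n >= 2."""
    s = min(math.ceil(alpha * f(n)), n - 1)
    g = math.sqrt(beta * s * math.log(n))
    return s, g


def relative_sample_size(n):
    """phi(n) = f(n) / n."""
    return f(n) / n
```

With these definitions, `f(10**6)` is close to the value 23995.0 listed in Table 4.1, and `relative_sample_size(10**6)` is about 0.024.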

Theorem 4.1. Let $C_{nk}$ denote the expected number of comparisons made by Select for $s$ and $g$ chosen as in (4.1) with $\beta \ge 1/6$. There exists a positive constant $\gamma$ such that

$$C_{nk} \le n + \min\{k, n-k\} + \gamma f(n) \quad \forall\, 1 \le k \le n. \tag{4.2}$$

Proof. We need a few preliminary facts. The function $\phi(t) := f(t)/t = (\ln t/t)^{1/3}$ decreases to 0 on $[e, \infty)$, whereas $f(t)$ grows to infinity on $[2, \infty)$. Let $\delta := 4(\beta/\alpha)^{1/2}$. Pick $\bar n \ge 3$ large enough so that $e - 1 \le \alpha f(\bar n) \le \bar n - 1$ and $e \le \delta f(\bar n)$. Let $\bar\alpha := \alpha + 1/f(\bar n)$. Then, by (4.1) and the monotonicity of $f$ and $\phi$, we have for $n \ge \bar n$

$$s \le \bar\alpha f(n) \quad\text{and}\quad f(s) \le \bar\alpha\,\phi(\alpha f(n))\,f(n), \tag{4.3}$$

$$f(\lfloor \delta f(n)\rfloor) \le f(\delta f(n)) \le \delta\,\phi(\delta f(n))\,f(n). \tag{4.4}$$

For instance, the first inequality of (4.3) yields $f(s) \le f(\bar\alpha f(n))$, whereas

$$f(\bar\alpha f(n)) = \bar\alpha\,\phi(\bar\alpha f(n))\,f(n) \le \bar\alpha\,\phi(\alpha f(n))\,f(n).$$

Also for $n \ge \bar n$, we have $s = \lceil \alpha f(n)\rceil = \alpha f(n) + \epsilon$ with $\epsilon \in [0, 1)$ in (4.1). Writing $s = \tilde\alpha f(n)$ with $\tilde\alpha := \alpha + \epsilon/f(n) \in [\alpha, \bar\alpha)$, we deduce from (4.1) that

$$gn/s = (\beta/\tilde\alpha)^{1/2} f(n) \le (\beta/\alpha)^{1/2} f(n). \tag{4.5}$$

In particular, $4gn/s \le \delta f(n)$, since $\delta := 4(\beta/\alpha)^{1/2}$. Next, (4.1) implies

$$n e^{-2g^2/s} \le n^{1-2\beta} = f(n)\, n^{1/3-2\beta} \ln^{-1/3} n. \tag{4.6}$$

Using the monotonicity of $f$ and $\phi$, increase $\bar n$ if necessary to get for all $n \ge \bar n$

$$2\bar\alpha\,\phi(\alpha f(n)) + \delta\,\phi(\delta f(n)) + 2n^{-2\beta} + 2\max\{[\delta f(n)]^{2/3-2\beta} n^{-2/3},\, n^{-2\beta}\} \le 0.95. \tag{4.7}$$

By Rem. 2.2(c), there is $\gamma$ such that (4.2) holds for all $n \le \bar n$; increasing $\gamma$ if necessary, and using the monotonicity of $f$ and the assumption $\beta \ge 1/6$, we have for all $n \ge \bar n$

$$2\bar\alpha + 2\delta + 5n^{1/3-2\beta}\ln^{-1/3} n + 3\max\{\delta^{1-2\beta} f(n)^{-2\beta},\, n^{1/3-2\beta}\ln^{-1/3} n\} \le 0.05\gamma. \tag{4.8}$$

Let $n' \ge \bar n$. Assuming (4.2) holds for all $n \le n'$, for induction let $n = n' + 1$.

We need to consider the following two cases in the first call of Select.

Left case: $k < n/2$. First, suppose the event $\mathcal{E}_l := \{x_k^* \le v \le x_{k_r}^*\}$ occurs. By the rules of Steps 4-6, we have $\hat X = L$ (from $x_k^* \le v$), $\hat k = k$ and $\hat n := |\hat X| \le k_r - 1$ (from $v \le x_{k_r}^*$); since $k_r < k + 2gn/s + 1$ by (3.2), we get the two (equivalent) bounds

$$\hat n < k + 2gn/s \quad\text{and}\quad \hat n - \hat k < 2gn/s. \tag{4.9}$$

Note that if $i_v = \lceil ks/n + g\rceil$ then, by Cor. 3.3(a,b), the Boole-Bonferroni inequality and the choice (4.1), the complement $\mathcal{E}_l'$ of $\mathcal{E}_l$ has $\mathrm{P}[\mathcal{E}_l'] \le 2e^{-2g^2/s} = 2n^{-2\beta}$. Second, if $i_v \ne \lceil ks/n + g\rceil$, then $n < k + gn/s$ (Cor. 3.3(e)) combined with $k < n/2$ gives $n < 2gn/s$; hence $\hat n - \hat k \le \hat n < n < 2gn/s$ implies (4.9). Since also $\mathcal{E}_l$ implies (4.9), we have

$$\mathrm{P}[\mathcal{A}_l'] \le 2n^{-2\beta} \quad\text{for}\quad \mathcal{A}_l := \{\hat n - \hat k < 2gn/s\}. \tag{4.10}$$

Right case: $k \ge n/2$. First, suppose the event $\mathcal{E}_r := \{x_{k_l}^* \le v \le x_k^*\}$ occurs. By the rules of Steps 4-6, we have $\hat X = R$ (from $v \le x_k^*$), $\hat n - \hat k = n - k$ and $\hat n := |\hat X| \le n - k_l$ (from $x_{k_l}^* \le v$); since $k_l \ge k - 2gn/s$ by (3.2), we get the two (equivalent) bounds

$$\hat n \le n - k + 2gn/s \quad\text{and}\quad \hat k \le 2gn/s, \tag{4.11}$$

using $\hat n - \hat k = n - k$. If $i_v = \lceil ks/n - g\rceil$ then, by Cor. 3.3(c,d), the complement $\mathcal{E}_r'$ of $\mathcal{E}_r$ has $\mathrm{P}[\mathcal{E}_r'] \le 2e^{-2g^2/s} = 2n^{-2\beta}$. Second, if $i_v \ne \lceil ks/n - g\rceil$, then $k \le gn/s$ (Cor. 3.3(e)) combined with $k \ge n/2$ gives $n \le 2gn/s$; hence $\hat k \le \hat n < n \le 2gn/s$ implies (4.11). Thus

$$\mathrm{P}[\mathcal{A}_r'] \le 2n^{-2\beta} \quad\text{for}\quad \mathcal{A}_r := \{\hat k \le 2gn/s\}. \tag{4.12}$$

Since $k < n - k$ if $k < n/2$, $n - k \le k$ if $k \ge n/2$, (4.10) and (4.12) yield

$$\mathrm{P}[\mathcal{B}'] \le 2n^{-2\beta} \quad\text{for}\quad \mathcal{B} := \{\hat n \le \min\{k, n-k\} + 2gn/s\}. \tag{4.13}$$

Note that $\min\{k, n-k\} \le \lfloor n/2\rfloor \le n/2$; this relation will be used implicitly below.

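For the recursive call of Step 7, let $\hat s$, $\hat g$ and $\hat\imath_v$ denote the quantities generated as in (4.1) and (2.1) with $n$ and $k$ replaced by $\hat n$ and $\hat k$, let $\hat v$ be the pivot found at Step 3, and let $\check X$, $\check n$ and $\check k$ correspond to $\hat X$, $\hat n$ and $\hat k$ at Step 7, so that $\check n := |\check X| < \hat n$.

The cost of selecting $v$ and $\hat v$ at Step 3 may be estimated as

$$C_{s i_v} + C_{\hat s \hat\imath_v} \le 1.5s + \gamma f(s) + 1.5\hat s + \gamma f(\hat s) \le 3s + 2\gamma f(s), \tag{4.14}$$

since $f$ is increasing and (4.2) holds for $\hat s \le s \le n - 1 = n'$ (cf. (4.1)) from $\hat n < n$.

Let $c := n - s$ and $\hat c := \hat n - \hat s$ denote the costs of Step 4 for the two calls. Since $0 \le \hat c < n$ and $\mathrm{E}\hat c = \mathrm{E}[\hat c\,|\,\mathcal{B}]\mathrm{P}[\mathcal{B}] + \mathrm{E}[\hat c\,|\,\mathcal{B}']\mathrm{P}[\mathcal{B}'] \le \mathrm{E}[\hat c\,|\,\mathcal{B}] + n\mathrm{P}[\mathcal{B}']$, by (4.13) we have

$$c + \mathrm{E}\hat c \le n - s + \min\{k, n-k\} + 2gn/s + 2n^{1-2\beta}. \tag{4.15}$$

Using (4.2) again with $\check n < n$, the cost of finishing up at Step 7 is at most

$$\mathrm{E}\,C_{\check n \check k} \le \mathrm{E}[1.5\check n + \gamma f(\check n)] = 1.5\,\mathrm{E}\check n + \gamma\,\mathrm{E}f(\check n). \tag{4.16}$$

Thus we need suitable bounds for $\mathrm{E}\check n$ and $\mathrm{E}f(\check n)$, which may be derived as follows. To generalize (4.13) to the recursive call, consider the events

$$\hat{\mathcal{B}} := \{\check n \le \min\{\hat k, \hat n - \hat k\} + 2\hat g\hat n/\hat s\} \quad\text{and}\quad \mathcal{C} := \{\check n \le \lfloor \delta f(n)\rfloor\}. \tag{4.17}$$

By (4.10) and (4.12), $\hat{\mathcal{B}} \cap \mathcal{A}_l$ and $\hat{\mathcal{B}} \cap \mathcal{A}_r$ imply $\mathcal{C}$, since $2gn/s + 2\hat g\hat n/\hat s \le \delta f(n)$ by (4.5) with $\hat n < n$ and $\delta := 4(\beta/\alpha)^{1/2}$. For the recursive call, proceeding as in the derivation of (4.13) with $n$ replaced by $\hat n = i$, $k$ by $\hat k$, etc., shows that, due to random sampling,

$$\mathrm{P}[\hat{\mathcal{B}}'\,|\,\mathcal{A}_l, \hat n = i] \le 2i^{-2\beta} \quad\text{and}\quad \mathrm{P}[\hat{\mathcal{B}}'\,|\,\mathcal{A}_r, \hat n = i] \le 2i^{-2\beta}. \tag{4.18}$$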

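In the left case of $k < n/2$, using $\check n < n$ and $\mathrm{P}[\mathcal{A}_l'] \le 2n^{-2\beta}$ (cf. (4.10)), we get

$$\mathrm{E}\check n = \mathrm{E}[\check n\,|\,\mathcal{A}_l]\mathrm{P}[\mathcal{A}_l] + \mathrm{E}[\check n\,|\,\mathcal{A}_l']\mathrm{P}[\mathcal{A}_l'] \le \mathrm{E}[\check n\,|\,\mathcal{A}_l] + n\,2n^{-2\beta}.$$

Partitioning $\mathcal{A}_l$ into the events $\mathcal{D}_i := \mathcal{A}_l \cap \{\hat n = i\}$, $i = 0\colon n-1$ ($\hat n < n$ always), we have

$$\mathrm{E}[\check n\,|\,\mathcal{A}_l] = \sum_{i=0}^{n-1} \mathrm{E}[\check n\,|\,\mathcal{D}_i]\,\mathrm{P}[\mathcal{D}_i\,|\,\mathcal{A}_l] \le \max_{i=0:n-1} \mathrm{E}[\check n\,|\,\mathcal{D}_i],$$

where $\mathrm{E}[\check n\,|\,\mathcal{D}_i] \le \lfloor \delta f(n)\rfloor$ if $i \le \lfloor \delta f(n)\rfloor + 1$, because $\check n < \hat n$ always. As for the remaining terms, $\hat{\mathcal{B}} \cap \mathcal{A}_l \subset \mathcal{C}$ implies $\mathrm{P}[\mathcal{C}'\,|\,\mathcal{D}_i] \le \mathrm{P}[\hat{\mathcal{B}}'\,|\,\mathcal{D}_i] \le 2i^{-2\beta}$ by (4.18), where $\mathcal{C} := \{\check n \le \lfloor \delta f(n)\rfloor\}$ and $\check n < \hat n = i$ when the event $\mathcal{D}_i$ occurs, so $\mathrm{E}[\check n\,|\,\mathcal{D}_i] \le \lfloor \delta f(n)\rfloor + i\,2i^{-2\beta}$. Hence

$$\max_{i=0:n-1} \mathrm{E}[\check n\,|\,\mathcal{D}_i] \le \lfloor \delta f(n)\rfloor + \max_{i=\lfloor \delta f(n)\rfloor+2:n-1} 2i^{1-2\beta},$$

where the final term is omitted if $\lfloor \delta f(n)\rfloor > n - 3$; otherwise it is at most

$$2\max\{(\lfloor \delta f(n)\rfloor + 1)^{1-2\beta},\, n^{1-2\beta}\} \le 2\max\{\delta^{1-2\beta} f(n)^{-2\beta},\, n^{1/3-2\beta}\ln^{-1/3} n\}\, f(n),$$

since $\max_{i=\lfloor \delta f(n)\rfloor+1:n} 2i^{1-2\beta}$ is bounded as above (consider $\beta \ge 1/2$, then $\beta < 1/2$ and use $\delta f(n) < \lfloor \delta f(n)\rfloor + 1$, the monotonicity of $f$ and (4.6) for the final inequality). Collecting the preceding estimates, we obtain

$$\mathrm{E}\check n \le \lfloor \delta f(n)\rfloor + 2n^{1-2\beta} + 2\max\{\delta^{1-2\beta} f(n)^{-2\beta},\, n^{1/3-2\beta}\ln^{-1/3} n\}\, f(n). \tag{4.19}$$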

Similarly, replacing $\check n$ by $f(\check n)$ in our derivations and using the monotonicity of $f$ yields

$$\mathrm{E}f(\check n) \le f(\lfloor \delta f(n)\rfloor) + 2f(n)n^{-2\beta} + \max_{i=\lfloor \delta f(n)\rfloor+2:n-1} 2f(i)i^{-2\beta}, \tag{4.20a}$$

where the final term is omitted if $\lfloor \delta f(n)\rfloor > n - 3$; otherwise it is at most

$$2\max\{[\delta f(n)]^{2/3-2\beta} n^{-2/3},\, n^{-2\beta}\}\, f(n). \tag{4.20b}$$

To see this, use the monotonicity of $f$ and the fact that for $i \le n$ (cf. (4.1))

$$f(i)i^{-2\beta}/f(n) = i^{2/3-2\beta} n^{-2/3} (\ln i/\ln n)^{1/3} \le i^{2/3-2\beta} n^{-2/3}.$$

For the right case, replace $\mathcal{A}_l$ by $\mathcal{A}_r$ in the preceding paragraph to get (4.19)-(4.20). Add the costs (4.14), (4.15) and (4.16), using (4.19)-(4.20), to get

$$\begin{aligned} C_{nk} \le{}& 3s + 2\gamma f(s) + n - s + \min\{k, n-k\} + 2gn/s + 2n^{1-2\beta} \\ &+ 1.5\lfloor \delta f(n)\rfloor + 3n^{1-2\beta} + 3\max\{\delta^{1-2\beta} f(n)^{-2\beta},\, n^{1/3-2\beta}\ln^{-1/3} n\}\, f(n) \\ &+ \gamma f(\lfloor \delta f(n)\rfloor) + 2\gamma f(n)n^{-2\beta} + 2\gamma\max\{[\delta f(n)]^{2/3-2\beta} n^{-2/3},\, n^{-2\beta}\}\, f(n). \end{aligned}$$

Now, using the bounds (4.3)-(4.4), $2gn/s \le \tfrac12\delta f(n)$ (cf. (4.5)) and (4.6) gives

$$\begin{aligned} C_{nk} \le{}& n + \min\{k, n-k\} \\ &+ \left[\, 2\bar\alpha + 2\delta + 5n^{1/3-2\beta}\ln^{-1/3} n + 3\max\{\delta^{1-2\beta} f(n)^{-2\beta},\, n^{1/3-2\beta}\ln^{-1/3} n\} \,\right] f(n) \\ &+ \left[\, 2\bar\alpha\,\phi(\alpha f(n)) + \delta\,\phi(\delta f(n)) + 2n^{-2\beta} + 2\max\{[\delta f(n)]^{2/3-2\beta} n^{-2/3},\, n^{-2\beta}\} \,\right] \gamma f(n). \end{aligned}$$

By (4.7)-(4.8), the two bracketed terms above are at most $0.05\gamma f(n)$ and $0.95\gamma f(n)$, respectively; thus (4.2) holds as required. □

4.2 Other sampling strategies 

We now indicate briefly how to adapt the proof of Thm 4.1 to several variations on (4.1); a choice similar to (4.21) below was used in [FlR75a].

Remarks 4.2. (a) Theorem 4.1 remains true for $\beta \ge 1/6$ and (4.1) replaced by

$$s := \min\{\lceil \alpha n^{2/3}\rceil, n - 1\}, \quad g := (\beta s \ln n)^{1/2} \quad\text{and}\quad f(n) := n^{2/3}\ln^{1/2} n. \tag{4.21}$$

Indeed, using $e^{3/2} - 1 \le \alpha \bar n^{2/3} \le \bar n - 1$, $e^{3/2} \le \delta f(\bar n)$, $\bar\alpha := \alpha + \bar n^{-2/3}$ and $s = \tilde\alpha n^{2/3}$ with $\tilde\alpha \in [\alpha, \bar\alpha)$ yields (4.3)-(4.5) as before, and $\ln^{-1/2}$ replaces $\ln^{-1/3}$ in (4.6), (4.8) and (4.19).

(b) Theorem 4.1 holds for the following modification of (4.1) with $\epsilon_l > 1$:

$$s := \min\{\lceil \alpha f(n)\rceil, n - 1\} \quad\text{and}\quad g := (\beta s \ln^{\epsilon_l} n)^{1/2} \quad\text{with}\quad f(n) := n^{2/3}\ln^{\epsilon_l/3} n. \tag{4.22}$$

First, using $e^{\epsilon_l} - 1 \le \alpha f(\bar n) \le \bar n - 1$ and $e^{\epsilon_l} \le \delta f(\bar n)$ gives (4.3)-(4.5) as before. Next, fix $\tilde\beta \ge 1/6$. Let $\beta_n := \beta \ln^{\epsilon_l - 1} n$. Increase $\bar n$ if necessary so that $\beta_i \ge \tilde\beta$ for all $i \ge \min\{\bar n, \lceil \delta f(\bar n)\rceil\}$; then replace $\beta$ by $\tilde\beta$ and $\ln^{-1/3}$ by $\ln^{-\epsilon_l/3}$ in (4.6) and below.

(c) Several other replacements for (4.1) may be analyzed as in [Kiw03b, §§4.1-4.2].

(d) None of these choices gives $f(n)$ better than that in (4.1) for the bound (4.2).

We now comment briefly on the possible use of sampling with replacement. 

Remarks 4.3. (a) Suppose Step 2 of Select employs sampling with replacement. Since the tail bound (3.1) remains valid for the binomial distribution [Chv79, Hoe63], Lemma 3.2 is not affected. However, when Step 4 no longer skips comparisons with the elements of $S$, $-s$ in (4.15) is replaced by 0; the resulting change in the bound on $C_{nk}$ only needs replacing $2\bar\alpha$ in (4.8) by $3\bar\alpha$. Hence the preceding results remain valid.

(b) Of course, sampling with replacement needs additional storage for $S$. However, the increase in both storage and the number of comparisons may be tolerated because the sample sizes are relatively small.

4.3 Handling small subfiles 

Since the sampling efficiency decreases when $X$ shrinks, consider the following modification. For a fixed cut-off parameter $n_{\rm cut} \ge 1$, let sSelect$(X, k)$ be a "small-select" routine that finds the $k$th smallest element of $X$ in at most $C_{\rm cut} < \infty$ comparisons when $|X| \le n_{\rm cut}$ (even bubble sort will do). Then Select is modified to start with the following

Step 0 (Small file case). If $n := |X| \le n_{\rm cut}$, return sSelect$(X, k)$.

Our preceding results remain valid for this modification. In fact it suffices if $C_{\rm cut}$ bounds the expected number of comparisons of sSelect$(X, k)$ for $n \le n_{\rm cut}$. For instance, (4.2) holds for $n \le n_{\rm cut}$ and $\gamma \ge C_{\rm cut}$, and by induction as in Rem. 2.2(c) we have $C_{nk} < \infty$ for all $n$, which suffices for the proof of Thm 4.1.

Another advantage is that even small $n_{\rm cut}$ (1000 say) limits nicely the stack space for recursion. Specifically, the tail recursion of Step 7 is easily eliminated (set $X := \hat X$, $k := \hat k$ and go to Step 0), and the calls of Step 3 deal with subsets whose sizes quickly reach $n_{\rm cut}$. For example, for the choice of (4.1) with $\alpha = 1$ and $n_{\rm cut} = 600$, at most four recursive levels occur for $n \le 2^{31} \approx 2.15 \cdot 10^9$.
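
The recursion-depth claim can be checked with a few lines of Python. The sketch below assumes that, once the tail recursion of Step 7 is removed, a new recursive level arises only when Step 3 calls Select on the sample, so it simply iterates the sample size of (4.1) until it drops to the cut-off; the function name is hypothetical.

```python
import math


def sample_nesting_depth(n, alpha=1.0, n_cut=600):
    """Count how many nested Step 3 calls occur before the sample
    size falls to n_cut or below, with s chosen as in (4.1)."""
    depth = 0
    while n > n_cut:
        f_n = n ** (2.0 / 3.0) * math.log(n) ** (1.0 / 3.0)
        n = min(math.ceil(alpha * f_n), n - 1)  # size of the next sample
        depth += 1
    return depth
```

For $n = 2^{31}$ the successive sample sizes are roughly $4.6 \cdot 10^6$, $6.9 \cdot 10^4$, $3.8 \cdot 10^3$ and $4.9 \cdot 10^2$, so the loop stops after four iterations.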

5 A modified version 

We now consider a modification inspired by a remark of [Bro76]. For $k$ close to $\lceil n/2\rceil$, by symmetry it is best to choose $v$ as the sample median with $i_v = \lceil s/2\rceil$, thus attempting to get $v$ close to $x_k^*$ instead of $x_{\lceil k - gn/s\rceil}^*$ or $x_{\lceil k + gn/s\rceil}^*$; then more elements are eliminated. Hence we may let

$$i_v := \begin{cases} \lceil ks/n + g\rceil & \text{if } k < n/2 - gn/s, \\ \lceil s/2\rceil & \text{if } n/2 - gn/s \le k \le n/2 + gn/s, \\ \lceil ks/n - g\rceil & \text{if } k > n/2 + gn/s. \end{cases} \tag{5.1}$$

Note that (5.1) coincides with (2.1) in the left case of $k < n/2 - gn/s$ and the right case of $k > n/2 + gn/s$, but the middle case of $n/2 - gn/s \le k \le n/2 + gn/s$ fixes $i_v$ at the median position $\lceil s/2\rceil$; in fact $i_v$ is the median of the three values in (5.1):

$$i_v := \max\{\min(\lceil ks/n + g\rceil, \lceil s/2\rceil),\, \lceil ks/n - g\rceil\}. \tag{5.2}$$
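
The equivalence between the three-case rule (5.1) and the median-of-three form (5.2) can be illustrated as follows. Both functions are hypothetical helpers written for this sketch; they take the rank $k$, the input size $n$, the sample size $s$ and the gap $g$ as plain numbers.

```python
import math


def pivot_rank_cases(k, n, s, g):
    """Pivot rank i_v from the three cases of (5.1)."""
    if k < n / 2 - g * n / s:
        return math.ceil(k * s / n + g)      # left case
    if k > n / 2 + g * n / s:
        return math.ceil(k * s / n - g)      # right case
    return math.ceil(s / 2)                  # middle case: sample median


def pivot_rank_median(k, n, s, g):
    """Pivot rank i_v as the median of three values, as in (5.2)."""
    upper = math.ceil(k * s / n + g)
    middle = math.ceil(s / 2)
    lower = math.ceil(k * s / n - g)
    return max(min(upper, middle), lower)
```

The second form avoids branching on $k$, which may be convenient in an implementation; the first makes the three regimes explicit.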

Corollary 3.3 remains valid for the left and right cases. For the middle case, letting

$$j_l := \max\{\lceil n/2 - gn/s\rceil, 1\} \quad\text{and}\quad j_r := \min\{\lceil n/2 + gn/s\rceil, n\}, \tag{5.3}$$

we obtain from Lemma 3.2 with $\kappa = 1/2$ the following complement of Corollary 3.3.

Corollary 5.1. $\mathrm{P}[\,v < x_{j_l}^*\,] \le e^{-2g^2/s}$ and $\mathrm{P}[\,x_{j_r}^* < v\,] \le e^{-2g^2/s}$ if $n/2 - gn/s \le k \le n/2 + gn/s$.

Theorem 5.2. Theorem 4.1 holds for Select with Step 3 using (5.1).

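Proof. We only indicate how to adapt the proof of Thm 4.1 following (4.8). As noted after (5.1), the left case now has $k < n/2 - gn/s$ and the right case has $k > n/2 + gn/s$, so we only need to discuss the middle case.

Middle case: $n/2 - gn/s \le k \le n/2 + gn/s$. Suppose the event $\mathcal{E}_m := \{x_{j_l}^* \le v \le x_{j_r}^*\}$ occurs (note that $\mathrm{P}[\mathcal{E}_m'] \le 2e^{-2g^2/s} = 2n^{-2\beta}$ by Cor. 5.1). If $\hat X = L$ then, by the rules of Steps 4-6, we have $\hat k = k$ and $\hat n \le j_r - 1$; since $j_r < n/2 + gn/s + 1$ by (5.3), we get $\hat n < n/2 + gn/s$. Hence $k \ge n/2 - gn/s$ yields $\hat n < k + 2gn/s$ and $\hat n - \hat k < 2gn/s$ as in (4.9). Next, if $\hat X = R$ then $\hat n - \hat k = n - k$ and $\hat k := k - |L \cup E|$, so $L \cup E = \{x \in X : x \le v\} \ni x_{j_l}^*$ gives $\hat k \le k - j_l$. Since $k \le n/2 + gn/s$ and $j_l \ge n/2 - gn/s$ by (5.3), we get $\hat k \le 2gn/s$ and $\hat n \le n - k + 2gn/s$ as in (4.11); further, $\hat n \le n - j_l$ yields $\hat n \le n/2 + gn/s$. Noticing that $n/2 - gn/s \le k \le n/2 + gn/s$ implies $n/2 \le \min\{k, n-k\} + gn/s$, we have $\hat n \le \min\{k, n-k\} + 2gn/s$ in both cases.

Thus in the middle case we again have (4.13) and hence (4.15); further, by (4.10) and (4.12), the event $\mathcal{E}_m \subset \mathcal{A}_l \cup \mathcal{A}_r$ is partitioned into $\mathcal{E}_m \cap \mathcal{A}_l$ and $\mathcal{E}_m \cap \mathcal{A}_l' \cap \mathcal{A}_r$.

Next, reasoning as before, we see that (4.18) and hence (4.19)-(4.20) remain valid in the left and right cases, whereas in the middle case we have

$$\mathrm{P}[\hat{\mathcal{B}}'\,|\,\mathcal{E}_m, \mathcal{A}_l, \hat n = i] \le 2i^{-2\beta} \quad\text{and}\quad \mathrm{P}[\hat{\mathcal{B}}'\,|\,\mathcal{E}_m, \mathcal{A}_l', \mathcal{A}_r, \hat n = i] \le 2i^{-2\beta}. \tag{5.4}$$

In the middle case, $\mathrm{E}\check n = \mathrm{E}[\check n\,|\,\mathcal{E}_m]\mathrm{P}[\mathcal{E}_m] + \mathrm{E}[\check n\,|\,\mathcal{E}_m']\mathrm{P}[\mathcal{E}_m']$ is bounded by $\mathrm{E}[\check n\,|\,\mathcal{E}_m] + 2n^{1-2\beta}$, since $\mathrm{P}[\mathcal{E}_m'] \le 2n^{-2\beta}$ and $\check n < n$ always. Next, partitioning $\mathcal{E}_m$ into $\mathcal{E}_m \cap \mathcal{A}_l$ and $\mathcal{E}_m \cap \mathcal{A}_l' \cap \mathcal{A}_r$, we obtain $\mathrm{E}[\check n\,|\,\mathcal{E}_m] \le \max\{\mathrm{E}[\check n\,|\,\mathcal{E}_m, \mathcal{A}_l],\, \mathrm{E}[\check n\,|\,\mathcal{E}_m, \mathcal{A}_l', \mathcal{A}_r]\}$, where $\mathrm{E}[\check n\,|\,\mathcal{E}_m, \mathcal{A}_l]$ and $\mathrm{E}[\check n\,|\,\mathcal{E}_m, \mathcal{A}_l', \mathcal{A}_r]$ may be bounded like $\mathrm{E}[\check n\,|\,\mathcal{A}_l]$ and $\mathrm{E}[\check n\,|\,\mathcal{A}_r]$ in the left and right cases to get (4.19). Then (4.20) is obtained similarly, and the conclusion follows as before. □

6 Ternary partitions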



In this section we discuss ways of implementing Select when the input set is given as an array $x[1:n]$. We employ the following notation.

Each stage works with a segment $x[l:r]$ of the input array $x[1:n]$, where $1 \le l \le r \le n$ are such that $x_i \le x_l$ for $i = 1\colon l-1$, $x_r \le x_i$ for $i = r+1\colon n$, and the $k$th smallest element of $x[1:n]$ is the $(k - l + 1)$th smallest element of $x[l:r]$. The task of Select is extended: given $x[l:r]$ and $l \le k \le r$, Select$(x, l, r, k, k_-, k_+)$ permutes $x[l:r]$ and finds $l \le k_- \le k \le k_+ \le r$ such that $x_i < x_k$ for all $l \le i < k_-$, $x_i = x_k$ for all $k_- \le i \le k_+$, $x_i > x_k$ for all $k_+ < i \le r$. The initial call is Select$(x, 1, n, k, k_-, k_+)$.

A vector swap denoted by $x[a:b] \leftrightarrow x[b+1:c]$ means that the first $d := \min(b + 1 - a, c - b)$ elements of array $x[a:c]$ are exchanged with its last $d$ elements in arbitrary order if $d > 0$; e.g., we may exchange $x_{a+i} \leftrightarrow x_{c-i}$ for $0 \le i < d$, or $x_{a+i} \leftrightarrow x_{c-d+i+1}$ for $0 \le i < d$.



6.1 Tripartitioning schemes 

For a given pivot $v := x_l$ from the array $x[l:r]$, the following ternary scheme [Kiw03a, §5.1] partitions the array into three blocks, with $x_m < v$ for $l \le m < a$, $x_m = v$ for $a \le m \le b$, $x_m > v$ for $b < m \le r$. After comparing the pivot $v$ to $x_r$ to produce the initial setup

    | x = v | x < v |   ?   | x > v | x = v |
      l       p   i           j   q       r          (6.1)

with $i := l$ and $j := r$, we work with the three inner blocks of the array

    | x = v | x < v |   ?   | x > v | x = v |
      l       p       i   j       q       r          (6.2)

until the middle part is empty or just contains an element equal to the pivot

    | x = v | x < v | x = v | x > v | x = v |
      l       p   j           i   q       r          (6.3)

(i.e., $j = i - 1$ or $j = i - 2$), then swap the ends into the middle for the final arrangement

    | x < v | x = v | x > v |
      l       a   b       r                          (6.4)

Scheme A (Safeguarded ternary partition).

A1. [Initialize.] Set $i := l$, $p := i + 1$, $j := r$ and $q := j - 1$. If $v > x_j$, exchange $x_i \leftrightarrow x_j$ and set $p := i$; else if $v < x_j$, set $q := j$.

A2. [Increase $i$ until $x_i \ge v$.] Increase $i$ by 1; then if $x_i < v$, repeat this step.

A3. [Decrease $j$ until $x_j \le v$.] Decrease $j$ by 1; then if $x_j > v$, repeat this step.

A4. [Exchange.] (Here $x_j \le v \le x_i$.) If $i < j$, exchange $x_i \leftrightarrow x_j$; then if $x_i = v$, exchange $x_i \leftrightarrow x_p$ and increase $p$ by 1; if $x_j = v$, exchange $x_j \leftrightarrow x_q$ and decrease $q$ by 1; return to A2. If $i = j$ (so that $x_i = x_j = v$), increase $i$ by 1 and decrease $j$ by 1.

A5. [Cleanup.] Set $a := l + j - p + 1$ and $b := r - q + i - 1$. Exchange $x[l:p-1] \leftrightarrow x[p:j]$ and $x[i:q] \leftrightarrow x[q+1:r]$.

Step A1 ensures that $x_l \le v \le x_r$, so steps A2 and A3 don't need to test whether $i \le j$. This scheme makes two extraneous comparisons (only one when $i = j$ at A4). Spurious comparisons are avoided in the following modification [Kiw03a, §5.3] of the scheme of [BeM93] (cf. [Knu98, Ex. 5.2.2-41]), for which $i = j + 1$ in (6.3).

Scheme B (Double-index controlled ternary partition).

B1. [Initialize.] Set $i := p := l + 1$ and $j := q := r$.

B2. [Increase $i$ until $x_i > v$.] If $i \le j$ and $x_i < v$, increase $i$ by 1 and repeat this step. If $i \le j$ and $x_i = v$, exchange $x_p \leftrightarrow x_i$, increase $p$ and $i$ by 1, and repeat this step.

B3. [Decrease $j$ until $x_j < v$.] If $i < j$ and $x_j > v$, decrease $j$ by 1 and repeat this step. If $i < j$ and $x_j = v$, exchange $x_j \leftrightarrow x_q$, decrease $j$ and $q$ by 1, and repeat this step. If $i \ge j$, set $j := i - 1$ and go to B5.

B4. [Exchange.] Exchange $x_i \leftrightarrow x_j$, increase $i$ by 1, decrease $j$ by 1, and return to B2.

B5. [Cleanup.] Set $a := l + i - p$ and $b := r - q + j$. Swap $x[l:p-1] \leftrightarrow x[p:j]$ and $x[i:q] \leftrightarrow x[q+1:r]$.
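
A direct transcription of Scheme B may help in following steps B1-B5. The Python sketch below is an illustration, not the Fortran code used in the experiments: it assumes 0-based inclusive indices, takes the pivot as `x[l]`, and uses the hypothetical names `vector_swap` and `tripartition_b`.

```python
def vector_swap(x, a, b, c):
    """Exchange the blocks x[a..b] and x[b+1..c] (inclusive bounds)
    by swapping the first d elements of x[a..c] with its last d."""
    d = min(b + 1 - a, c - b)
    for t in range(d):
        x[a + t], x[c - t] = x[c - t], x[a + t]


def tripartition_b(x, l, r):
    """Partition x[l..r] about v = x[l]; return (a, b) such that
    x[l..a-1] < v, x[a..b] == v and x[b+1..r] > v."""
    v = x[l]
    i = p = l + 1                      # B1
    j = q = r
    while True:
        while i <= j and x[i] <= v:    # B2: skip small, collect equal on the left
            if x[i] == v:
                x[p], x[i] = x[i], x[p]
                p += 1
            i += 1
        while i < j and x[j] >= v:     # B3: skip large, collect equal on the right
            if x[j] == v:
                x[j], x[q] = x[q], x[j]
                q -= 1
            j -= 1
        if i >= j:
            j = i - 1
            break
        x[i], x[j] = x[j], x[i]        # B4: exchange a large/small pair
        i += 1
        j -= 1
    a = l + i - p                      # B5: move both equal blocks to the middle
    b = r - q + j
    vector_swap(x, l, p - 1, j)
    vector_swap(x, i, q, r)
    return a, b
```

For example, partitioning `[3, 1, 3, 5, 2, 3, 4, 3]` over its full range gives `a = 2` and `b = 5`, with the two smaller elements first and the two larger ones last.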



6.2 Preparing for ternary partitions 



At Step 1, $r - l + 1$ replaces $n$ in finding $s$ and $g$. At Step 2, it is convenient to place the sample in the initial part of $x[l:r]$ by exchanging $x_i \leftrightarrow x_{i+{\rm rand}(r-i)}$ for $l \le i \le r_s := l + s - 1$, where ${\rm rand}(r - i)$ denotes a random integer, uniformly distributed between 0 and $r - i$. Step 3 uses $i := k - l + 1$ and $m := r - l + 1$ instead of $k$ and $n$ to find the pivot position

$$k_v := \begin{cases} \min\{\lceil l - 1 + is/m + g\rceil, r_s\} & \text{if } i < m/2, \\ \max\{\lceil l - 1 + is/m - g\rceil, l\} & \text{if } i \ge m/2, \end{cases} \tag{6.5}$$

so that the recursive call of Select$(x, l, r_s, k_v, k_v^-, k_v^+)$ produces $v := x_{k_v}$.
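
The in-place sample placement of Step 2 is a partial shuffle of the segment. A minimal Python sketch follows, assuming 0-based inclusive indices and a hypothetical function name; `rng.randint(0, r - i)` plays the role of ${\rm rand}(r - i)$.

```python
import random


def place_sample_first(x, l, r, s, rng=random):
    """Move a uniform random sample of size s (without replacement)
    into x[l..l+s-1] by exchanges; return r_s = l + s - 1."""
    r_s = l + s - 1
    for i in range(l, r_s + 1):
        t = i + rng.randint(0, r - i)   # uniform index in [i, r]
        x[i], x[t] = x[t], x[i]
    return r_s
```

After the call, the recursive pivot selection of Step 3 can work on `x[l..r_s]` alone, and no extra storage is needed for the sample.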
After $v$ has been found, our array looks as follows

    | x < v | x = v | x > v |   ?   |
      l       k_v^-  k_v^+   r_s       r             (6.6)

Setting $\bar l := k_v^-$ and $\bar r := r - r_s + k_v^+$, we swap $x[k_v^+ + 1 : r_s] \leftrightarrow x[r_s + 1 : r]$ in (6.6) to get

    | x < v | x = v |   ?   | x > v |
      l       lbar   k_v^+   rbar      r             (6.7)

If $k_v^+ = r_s$, we use scheme A with $l$ replaced by $k_v^+$ in A1 (cf. (6.1)) and by $\bar l$ in A5 (cf. (6.3)); for $k_v^+ < r_s$, we set $i := k_v^+$, $p := i + 1$, $j := \bar r + 1$, $q := \bar r$, omit A1 and replace $l$, $r$ by $\bar l$, $\bar r$ in A5. Similarly, for scheme B we replace $l$, $r$ by $k_v^+$, $\bar r$ in B1, and by $\bar l$, $\bar r$ in B5.

After partitioning $l$ and $r$ are updated by setting $l := b + 1$ if $a \le k$, $r := a - 1$ if $k \le b$. If $l \ge r$, Select may return $k_- := k_+ := k$ if $l = r$, $k_- := r + 1$ and $k_+ := l - 1$ if $l > r$. Otherwise, instead of calling Select recursively, Step 6 may jump back to Step 1, or to Step 0 if sSelect is used (cf. §4.3).

A simple version of sSelect is obtained if Steps 2 and 3 choose $v := x_k$ when $r - l + 1 \le n_{\rm cut}$ (this choice of [FlR75a] works well in practice, but more sophisticated pivots could be tried); then the ternary partitioning code can be used by sSelect as well.

7 Experimental results 

7.1 Implemented algorithms 

An implementation of Select was programmed in Fortran 77 and run on a notebook PC (Pentium 4M 2 GHz, 768 MB RAM) under MS Windows XP. The input set $X$ was specified as a double precision array. For efficiency, the recursion was removed and small arrays with $n \le n_{\rm cut}$ were handled as if Steps 2 and 3 chose $v := x_k$; the resulting version of sSelect (cf. §§4.3 and 6.2) typically required less than $3.5n$ comparisons. The choice of (4.21) was employed, with the parameters $\alpha = 0.5$, $\beta = 0.25$ and $n_{\rm cut} = 600$ as proposed in [FlR75a]; future work should test other sample sizes and parameters.

7.2 Testing examples 

As in [Kiw03b], we used minor modifications of the input sequences of [Val00]:

random     A random permutation of the integers 1 through $n$.
onezero    A random permutation of $\lceil n/2 \rceil$ ones and $\lfloor n/2 \rfloor$ zeros.
sorted     The integers 1 through $n$ in increasing order.
rotated    A sorted sequence rotated left once; i.e., $(2, 3, \ldots, n, 1)$.
organpipe  The integers $(1, 2, \ldots, n/2, n/2, \ldots, 2, 1)$.
m3killer   Musser's "median-of-3 killer" sequence with $n = 4j$ and $k = n/2$:

$$
\begin{pmatrix}
1 & 2 & 3 & 4 & \cdots & k-2 & k-1 & k & k+1 & \cdots & 2k-1 & 2k \\
1 & k+1 & 3 & k+3 & \cdots & 2k-3 & k-1 & 2 & 4 & \cdots & 2k-2 & 2k
\end{pmatrix}
$$

twofaced   Obtained by randomly permuting the elements of an m3killer sequence in
           positions $4\lfloor \log_2 n \rfloor$ through $n/2 - 1$ and $n/2 + 4\lfloor \log_2 n \rfloor - 1$ through $n - 2$.

For each input sequence, its (lower) median element was selected for $k := \lceil n/2 \rceil$.
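
The simpler test sequences are easy to generate. The Python sketch below covers the random, onezero, sorted, rotated and organpipe inputs only; the m3killer and twofaced inputs are left out because their construction should follow [Val00] exactly. The function names are hypothetical, and organpipe is assumed to be requested with even $n$.

```python
import random

def make_sequence(kind, n, rng=random):
    """Return one test input of length n as a Python list."""
    if kind == "random":
        seq = list(range(1, n + 1))
        rng.shuffle(seq)
        return seq
    if kind == "onezero":
        # ceil(n/2) ones and floor(n/2) zeros, randomly permuted
        seq = [1] * ((n + 1) // 2) + [0] * (n // 2)
        rng.shuffle(seq)
        return seq
    if kind == "sorted":
        return list(range(1, n + 1))
    if kind == "rotated":
        # sorted sequence rotated left once: (2, 3, ..., n, 1)
        return list(range(2, n + 1)) + [1]
    if kind == "organpipe":
        half = n // 2                     # assumes n is even
        return list(range(1, half + 1)) + list(range(half, 0, -1))
    raise ValueError("unknown sequence kind: " + kind)

def lower_median_rank(n):
    """1-based rank k := ceil(n/2) of the (lower) median."""
    return (n + 1) // 2
```

Passing a seeded `random.Random` instance as `rng` makes the randomly generated instances reproducible.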

Table 7.1: Performance of Select on randomly generated inputs.

Sequence   Size     Time [msec]    Comparisons [n]  γ_avg  L_avg   P_avg   N_avg  p_avg  s_avg
              n   avg  max  min    avg   max   min           [n]  [ln n]  [ln n]          [%n]
random      50K     2   10    0   1.66  1.77  1.61   1.74   1.65    0.46    0.55   8.33   2.59
           100K     3   10    0   1.63  1.71  1.55   1.76   1.63    0.60    0.69   7.58   2.12
           500K    13   20   10   1.56  1.61  1.54   1.36   1.56    0.67    0.74   8.05   1.19
             1M    23   30   20   1.52  1.58  1.00   0.55   1.52    0.66    0.73   8.32   0.91
             2M    46   51   40   1.54  1.56  1.52   1.22   1.54    0.75    0.82   8.38   0.72
             4M    88   91   80   1.53  1.55  1.52   1.18   1.53    0.86    0.92   8.22   0.57
             8M   172  181  160   1.52  1.53  1.51   1.13   1.52    0.92    0.98   8.54   0.44
            16M   336  341  320   1.52  1.53  1.51   1.06   1.52    0.95    1.01   8.41   0.35
onezero     50K     2   10    0   1.28  1.51  1.00   0.00   1.28    0.24    0.18   1.26   1.91
           100K     3   10    0   1.25  1.51  1.00   0.00   1.25    0.26    0.15   1.20   1.49
           500K    15   20   10   1.33  1.50  1.00   0.00   1.33    0.29    0.17   1.34   0.93
             1M    30   41   20   1.33  1.50  1.00   0.00   1.33    0.27    0.15   1.20   0.73
             2M    60   71   41   1.30  1.50  1.00   0.00   1.30    0.26    0.14   1.29   0.56
             4M   109  131   90   1.20  1.50  1.00   0.00   1.20    0.22    0.13   1.18   0.41
             8M   219  261  190   1.20  1.50  1.00   0.00   1.20    0.22    0.13   1.31   0.32
            16M   436  501  370   1.25  1.50  1.00   0.00   1.25    0.20    0.11   1.21   0.27
twofaced    50K     1   10    0   1.67  1.77  1.59   1.87   1.67    0.47    0.56   8.24   2.63
           100K     3   11    0   1.62  1.73  1.56   1.67   1.62    0.60    0.69   7.61   2.11
           500K    12   20   10   1.56  1.59  1.53   1.23   1.56    0.63    0.71   8.33   1.18
             1M    24   31   20   1.55  1.57  1.53   1.23   1.55    0.69    0.76   8.22   0.92
             2M    45   51   40   1.54  1.57  1.52   1.23   1.54    0.78    0.85   8.36   0.73
             4M    88   91   80   1.53  1.54  1.52   1.17   1.53    0.88    0.94   8.05   0.57
             8M   170  180  160   1.52  1.53  1.51   1.12   1.52    0.90    0.97   8.51   0.44
            16M   332  341  320   1.52  1.53  1.51   1.04   1.52    0.96    1.02   8.55   0.35

7.3 Computational results 

We varied the input size n from 50,000 to 16,000,000. For the random, onezero and 
twofaced sequences, for each input size, 20 instances were randomly generated; for the 
deterministic sequences, 20 runs were made to measure the solution time. 
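
The reporting protocol (20 instances per input size, with average, maximum and minimum comparison counts expressed in multiples of $n$) can be illustrated by a small harness. The sketch below is in Python and uses a plain randomized Find as a stand-in for Select, so its counts are not comparable with those in the tables; the names `find_counting` and `summarize` are hypothetical, and each element examined in a partition is charged one three-way comparison.

```python
import random

def find_counting(x, k, rng):
    """Randomized Find on a copy of x; returns (k-th smallest, comparisons).

    k is 0-based. This is a stand-in for Select, used only to show how
    the statistics are gathered.
    """
    items = list(x)
    comparisons = 0
    while True:
        v = items[rng.randrange(len(items))]
        less = [e for e in items if e < v]
        greater = [e for e in items if e > v]
        comparisons += len(items)      # one three-way comparison per element
        n_equal = len(items) - len(less) - len(greater)
        if k < len(less):
            items = less
        elif k < len(less) + n_equal:
            return v, comparisons
        else:
            k -= len(less) + n_equal
            items = greater

def summarize(n, instances=20, seed=0):
    """Average, maximum and minimum comparisons per element over
    `instances` random permutations of 1..n (lower median selected)."""
    rng = random.Random(seed)
    k = (n + 1) // 2 - 1               # 0-based rank of the lower median
    per_element = []
    for _ in range(instances):
        x = list(range(1, n + 1))
        rng.shuffle(x)
        value, c = find_counting(x, k, rng)
        assert value == k + 1          # sanity check on the selected element
        per_element.append(c / n)
    avg = sum(per_element) / instances
    return avg, max(per_element), min(per_element)
```

A call such as `summarize(50_000)` returns the three numbers that would fill the "Comparisons [n]" columns for one input size.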

The performance of Select on randomly generated inputs is summarized in Table 7.1,
where the average, maximum and minimum solution times are in milliseconds, and the
comparison counts are in multiples of $n$; e.g., column six gives $C_{\rm avg}/n$, where $C_{\rm avg}$ is the
average number of comparisons made over all instances. Thus $\gamma_{\rm avg} := (C_{\rm avg} - 1.5n)_+/f(n)$
estimates the constant $\gamma$ in the bound (4.2); moreover, we have $C_{\rm avg} \approx L_{\rm avg}$, where $L_{\rm avg}$
is the average sum of sizes of partitioned arrays. Further, $P_{\rm avg}$ is the average number of
Select partitions, whereas $N_{\rm avg}$ is the average number of calls to sSelect and $p_{\rm avg}$ is the
average number of sSelect partitions per call; both $P_{\rm avg}$ and $N_{\rm avg}$ grow slowly with $\ln n$
(linearly on the onezero inputs). Finally, $s_{\rm avg}$ is the average sum of sample sizes; $s_{\rm avg}/n^{2/3}$
drops from 0.95 for $n$ = 50K to 0.88 for $n$ = 16M on the random and twofaced inputs,
and oscillates about 0.7 on the onezero inputs, whereas the initial $s/n^{2/3} \approx \alpha = 0.5$.
The results for the random and twofaced sequences are very similar: the average solution
times grow linearly with $n$ (except for small inputs whose solution times could not be
measured accurately), and the differences between maximum and minimum times are
quite small (and also partly due to the operating system). Except for the smallest inputs,
the maximum and minimum numbers of comparisons are quite close, and $C_{\rm avg}$ nicely
approaches the theoretical lower bound of $1.5n$; this is reflected in the values of $\gamma_{\rm avg}$. The
results for the onezero inputs essentially average two cases: the first pass eliminates either
almost all or about half of the elements.

Table 7.2: Performance of Select on deterministic inputs.

Sequence   Size     Time [msec]    Comparisons [n]  γ_avg  L_avg   P_avg   N_avg  p_avg  s_avg
              n   avg  max  min    avg   max   min           [n]  [ln n]  [ln n]          [%n]
sorted      50K     1   10    0   1.67  1.76  1.59   1.85   1.66    0.48    0.57   7.24   2.65
           100K     2   10    0   1.62  1.69  1.55   1.70   1.62    0.60    0.69   6.76   2.12
           500K     8   10    0   1.56  1.62  1.53   1.35   1.56    0.67    0.74   7.52   1.19
             1M    15   20   10   1.54  1.58  1.53   1.19   1.54    0.68    0.75   7.87   0.92
             2M    27   31   20   1.54  1.56  1.52   1.23   1.54    0.74    0.81   7.61   0.73
             4M    51   61   40   1.53  1.55  1.52   1.19   1.53    0.87    0.93   7.34   0.57
             8M    98  111   90   1.52  1.53  1.51   1.10   1.52    0.89    0.95   8.03   0.44
            16M   186  200  170   1.52  1.52  1.51   1.04   1.52    0.95    1.01   7.99   0.35
rotated     50K     1   10    0   1.67  1.78  1.59   1.86   1.66    0.48    0.57   9.45   2.64
           100K     2   10    0   1.63  1.73  1.58   1.76   1.63    0.61    0.69   9.12   2.12
           500K     8   10    0   1.56  1.62  1.54   1.39   1.56    0.65    0.73  10.03   1.18
             1M    15   20   10   1.55  1.58  1.53   1.29   1.55    0.69    0.76   9.56   0.92
             2M    27   31   20   1.54  1.55  1.52   1.19   1.54    0.78    0.84   8.69   0.72
             4M    51   60   50   1.53  1.54  1.52   1.18   1.53    0.87    0.94   8.92   0.57
             8M    98  111   90   1.52  1.53  1.51   1.12   1.52    0.89    0.96   9.29   0.44
            16M   185  210  170   1.52  1.53  1.51   1.04   1.52    0.93    0.99   8.96   0.35
organpipe   50K     1   10    0   1.67  1.78  1.59   1.94   1.67    0.45    0.55   8.21   2.62
           100K     3   10    0   1.62  1.69  1.57   1.68   1.62    0.60    0.69   7.61   2.11
           500K    10   10   10   1.57  1.60  1.54   1.43   1.56    0.67    0.75   8.18   1.19
             1M    20   20   10   1.55  1.58  1.52   1.24   1.55    0.70    0.77   8.21   0.93
             2M    37   41   30   1.53  1.55  1.52   1.15   1.53    0.78    0.85   8.48   0.72
             4M    68   80   60   1.53  1.54  1.52   1.13   1.53    0.84    0.91   8.21   0.57
             8M   130  150  120   1.52  1.54  1.51   1.07   1.52    0.88    0.94   8.64   0.44
            16M   240  260  230   1.52  1.53  1.51   1.02   1.52    0.94    1.00   8.44   0.35
m3killer    50K     1   10    0   1.67  1.76  1.60   1.89   1.67    0.47    0.55   8.82   2.62
           100K     4   10    0   1.63  1.71  1.57   1.80   1.63    0.60    0.69   7.69   2.13
           500K    11   20   10   1.57  1.62  1.53   1.44   1.57    0.66    0.73   8.61   1.19
             1M    20   20   20   1.55  1.59  1.52   1.40   1.55    0.72    0.79   8.33   0.93
             2M    38   41   30   1.54  1.56  1.52   1.25   1.54    0.78    0.85   8.30   0.73
             4M    73   81   70   1.53  1.54  1.52   1.28   1.53    0.87    0.94   8.22   0.57
             8M   137  150  130   1.52  1.53  1.51   1.05   1.52    0.91    0.97   8.37   0.44
            16M   248  260  230   1.52  1.52  1.51   0.96   1.52    0.92    0.97   8.42   0.35
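
The derived quantities quoted in the discussion of Table 7.1 can be recomputed from the tabulated columns. The Python sketch below shows the two conversions; the function names are hypothetical, and $f$ is passed in by the caller because the function $f(n)$ of the bound (4.2) is defined elsewhere in the paper and is not restated here.

```python
def sample_ratio(s_avg_percent, n):
    """s_avg / n^(2/3), with s_avg given in percent of n as in the tables."""
    return (s_avg_percent / 100.0) * n / n ** (2.0 / 3.0)

def gamma_avg(c_avg_per_n, n, f):
    """(C_avg - 1.5 n)_+ / f(n), with C_avg given in multiples of n."""
    return max(c_avg_per_n * n - 1.5 * n, 0.0) / f(n)

# Random inputs of Table 7.1: s_avg = 2.59 %n at n = 50K, 0.35 %n at n = 16M.
print(round(sample_ratio(2.59, 50_000), 2))       # 0.95
print(round(sample_ratio(0.35, 16_000_000), 2))   # 0.88
```

Because of the positive part in the definition, any row with $C_{\rm avg} < 1.5n$, such as the onezero rows, yields $\gamma_{\rm avg} = 0$ regardless of $f$.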

Table 7.2 exhibits similar features of Select on the deterministic inputs. The results
for the sorted and rotated sequences are very similar, whereas the solution times on the
organpipe and m3killer sequences are between those for the sorted and random sequences.

The results of Tabs. 7.1–7.2 were obtained with scheme A of §6.2; to save space, Table
7.3 gives only selected results for scheme B, whereas Table 7.4 presents results for the
hybrid scheme of [Kiw03a, §5.6], which combines some features of schemes A and B. The
hybrid scheme is quite competitive, although slower than scheme A on the onezero inputs.

Table 7.3: Performance of Select with ternary scheme B.

Sequence   Size     Time [msec]    Comparisons [n]  γ_avg  L_avg   P_avg   N_avg  p_avg  s_avg
              n   avg  max  min    avg   max   min           [n]  [ln n]  [ln n]          [%n]
random       2M    43   51   40   1.53  1.54  1.52   1.02   1.53    0.76    0.83   8.31   0.72
             4M    93  101   90   1.53  1.55  1.52   1.09   1.53    0.85    0.92   8.42   0.57
             8M   177  190  170   1.52  1.54  1.51   1.03   1.52    0.87    0.93   8.15   0.44
            16M   343  350  340   1.51  1.53  1.51   0.88   1.51    0.91    0.97   8.50   0.35
onezero      2M    82   91   70   1.30  1.50  1.00   0.00   1.30    0.26    0.14   1.29   0.56
             4M   149  180  130   1.20  1.50  1.00   0.00   1.20    0.22    0.13   1.18   0.41
             8M   304  351  270   1.20  1.50  1.00   0.00   1.20    0.22    0.13   1.31   0.32
            16M   621  711  531   1.25  1.50  1.00   0.00   1.25    0.20    0.11   1.21   0.27
sorted       2M    23   30   20   1.54  1.55  1.52   1.18   1.54    0.78    0.85   7.61   0.72
             4M    43   50   40   1.53  1.54  1.51   1.18   1.53    0.86    0.92   7.76   0.57
             8M    82   90   80   1.52  1.53  1.51   1.10   1.52    0.89    0.95   8.01   0.44
            16M   156  160  150   1.52  1.53  1.51   1.04   1.52    0.97    1.03   8.12   0.35

The preceding results were obtained with the modified choice (5.1) of $i_v$. For brevity,
Table 7.5 gives results for Select with scheme A and the standard choice (2.1) of $i_v$ on
the random inputs only, since these inputs are most frequently used in theory and practice
for evaluating sorting and selection methods. The modified choice typically requires fewer
comparisons for small inputs, but its advantages are less pronounced for larger inputs. A
similar behavior was observed for Select with scheme B.

For comparison, Table 7.6 extracts from [Kiw03b] some results of qSelect for the
samples (4.1). As noted earlier, qSelect is slightly faster than Select on larger inputs
because most of its work occurs on the first partition (cf. $L_{\rm avg}$ in Tabs. 7.1 and 7.6). In
Table 7.7 we give corresponding results for riSelect, a Fortran version of the algorithm of
[Val00]. For these inputs, riSelect behaves like Find with median-of-3 pivots (because
the average numbers of randomization steps, $N_{\rm rnd}$, are negligible); hence the expected
value of $C_{\rm avg}$ is of order $2.75n$ [KMP97].

Our final Table 7.8 shows that Select beats its competitors with respect to the numbers
of comparisons made on small random inputs (100 instances for each input size $n$).

Our computational results, combined with those in [Kiw03a, Kiw03b], suggest that
both Select and qSelect may compete with Find in practice.

Acknowledgment. I would like to thank Olgierd Hryniewicz, Roger Koenker, Ronald 
L. Rivest and John D. Valois for useful discussions. 

Table 7.4: Performance of Select with the hybrid scheme of [Kiw03a, §5.6].

Sequence   Size     Time [msec]    Comparisons [n]  γ_avg  L_avg   P_avg   N_avg  p_avg  s_avg
              n   avg  max  min    avg   max   min           [n]  [ln n]  [ln n]          [%n]
random       2M    44   50   40   1.53  1.54  1.52   1.03   1.53    0.76    0.83   8.31   0.72
             4M    86  100   80   1.53  1.55  1.52   1.10   1.53    0.85    0.92   8.42   0.57
             8M   163  171  160   1.52  1.54  1.51   1.03   1.52    0.87    0.93   8.15   0.44
            16M   317  321  310   1.51  1.53  1.51   0.88   1.51    0.91    0.97   8.50   0.35
onezero      2M    74   80   70   1.30  1.50  1.00   0.00   1.30    0.26    0.14   1.29   0.56
             4M   141  151  130   1.20  1.50  1.00   0.00   1.20    0.22    0.13   1.18   0.41
             8M   285  301  270   1.20  1.50  1.00   0.00   1.20    0.22    0.13   1.31   0.32
            16M   578  621  541   1.25  1.50  1.00   0.00   1.25    0.20    0.11   1.21   0.27
sorted       2M    23   30   20   1.54  1.55  1.52   1.18   1.54    0.78    0.85   7.61   0.72
             4M    42   50   40   1.53  1.54  1.51   1.19   1.53    0.86    0.92   7.76   0.57
             8M    80   80   80   1.52  1.53  1.51   1.11   1.52    0.89    0.95   8.01   0.44
            16M   153  170  150   1.52  1.53  1.51   1.04   1.52    0.97    1.03   8.12   0.35

Table 7.5: Performance of Select with the standard choice of $i_v$.

Sequence   Size     Time [msec]    Comparisons [n]  γ_avg  L_avg   P_avg   N_avg  p_avg  s_avg
              n   avg  max  min    avg   max   min           [n]  [ln n]  [ln n]          [%n]
random      50K     4   10    0   1.83  1.97  1.74   3.73   1.83    0.57    0.67   8.49   2.96
           100K     4   10    0   1.73  1.83  1.61   3.13   1.73    0.73    0.82   7.80   2.32
           500K    14   20   10   1.65  1.69  1.61   3.25   1.65    0.82    0.90   8.40   1.30
             1M    25   30   20   1.61  1.65  1.58   2.83   1.60    0.89    0.97   8.28   0.99
             2M    46   50   40   1.59  1.61  1.56   2.92   1.59    0.99    1.06   8.01   0.77
             4M    90  100   80   1.56  1.58  1.54   2.61   1.56    1.15    1.22   8.34   0.60
             8M   174  181  170   1.55  1.57  1.54   2.70   1.55    1.21    1.27   8.09   0.47
            16M   341  351  330   1.54  1.56  1.53   2.68   1.54    1.21    1.28   8.33   0.36

Table 7.6: Performance of quintary qSelect on random inputs.

Sequence   Size     Time [msec]    Comparisons [n]  γ_avg  L_avg   P_avg   N_avg  p_avg  s_avg
              n   avg  max  min    avg   max   min           [n]  [ln n]  [ln n]          [%n]
random      50K     3   10    0   1.81  1.85  1.77   5.23   1.22    0.46    1.01   7.62   4.11
           100K     4   10    0   1.72  1.76  1.65   4.50   1.15    0.45    0.99   8.05   3.20
           500K    13   20   10   1.62  1.63  1.60   4.14   1.08    0.59    1.27   7.59   1.86
             1M    24   30   20   1.59  1.60  1.57   3.93   1.06    0.64    1.35   8.18   1.47
             2M    46   50   40   1.57  1.58  1.56   3.73   1.04    0.76    1.59   7.67   1.16
             4M    86   91   80   1.56  1.56  1.55   3.61   1.03    0.94    1.94   7.21   0.91
             8M   163  171  160   1.54  1.55  1.54   3.45   1.03    0.98    1.99   7.45   0.72
            16M   316  321  310   1.53  1.54  1.53   3.44   1.02    0.99    2.02   7.55   0.57

Table 7.7: Performance of riSelect on random inputs.

Sequence   Size     Time [msec]    Comparisons [n]  L_avg   P_avg  N_rnd
              n   avg  max  min    avg   max   min    [n]  [ln n]
random      50K     2   10    0   3.10  4.32  1.88   3.10    1.63   0.45
           100K     4   10    0   2.61  4.19  1.77   2.61    1.60   0.20
           500K    17   20   10   2.91  4.45  1.69   2.91    1.57   0.25
             1M    33   41   20   2.81  3.79  1.84   2.81    1.57   0.40
             2M    62   90   40   2.60  3.57  1.83   2.60    1.61   0.35
             4M   135  191   90   2.86  4.38  1.83   2.86    1.65   0.55
             8M   249  321  190   2.60  3.48  1.80   2.60    1.58   0.40
            16M   553  762  331   2.99  4.49  1.73   2.99    1.58   0.40

Table 7.8: Numbers of comparisons per element made on small random inputs.

Size             1000   2500   5000   7500  10000  12500  15000  17500  20000  25000
Select    avg    2.48   2.06   1.93   1.87   1.81   1.79   1.77   1.76   1.74   1.71
          max    4.25   3.03   2.28   2.22   2.09   2.05   1.95   1.93   1.93   1.93
          min    1.55   1.06   1.03   1.64   1.62   1.61   1.64   1.63   1.59   1.60
qSelect   avg    2.86   2.55   2.24   2.16   2.07   2.03   1.98   1.98   1.94   1.90
          max    3.97   3.55   2.57   2.38   2.28   2.21   2.16   2.13   2.11   2.31
          min    2.29   1.97   1.98   1.95   1.87   1.86   1.82   1.83   1.82   1.75
riSelect  avg    2.72   2.85   2.66   2.71   2.72   2.83   2.78   2.75   2.75   2.84
          max    4.40   4.51   4.69   4.43   4.62   4.76   4.64   4.40   5.10   4.77
          min    1.68   1.83   1.75   1.59   1.70   1.77   1.78   1.67   1.90   1.71